Challenges You Will Face When Parsing PDFs with Python
theseattledataguy.com·2h·Discuss: Hacker News
📄PDF Archaeology
Weasel words and co.: Guide to recognising AI-generated texts on Wikipedia
heise.de·1h
📜Binary Philology
Preserving the digital legacy of company archives: Last stop, Newhaven.
dpconline.org·9h
💾Data Preservation
The Rise of Semantic Entity Resolution
towardsdatascience.com·1d
📄Semantic Chunking
Semantic Dictionary Encoding
falvotech.com·3h·Discuss: Hacker News
🌀Brotli Dictionary
Ancient Scripts, Modern AI: Bridging the Divide with Morphology-Aware Tokenization by Arvind Sundararajan
dev.to·1d·Discuss: DEV
Concrete Syntax
AI models are struggling to identify hate speech, study finds
the-independent.com·1h
📰Content Curation
WorldCat Editions and Holdings Release
annas-archive.org·1d·Discuss: Hacker News
📚MARC Records
UTF-8 Is Beautiful
hackaday.com·12h
🔣Unicode
Sindhi Halchal Archive: Building on the PG Sindhi Library
digitalorientalist.com·3d
Web Archiving
Satyajit Das: On Reading – Textual Pleasures
nakedcapitalism.com·2d
📕Bookbinding
Show HN: Semlib – Semantic Data Processing
github.com·3h·Discuss: Hacker News
🌳Incremental Parsing
'Publish or perish' evolutionary pressures shape scientific publishing, for better and worse
phys.org·1h
📊Citation Graphs
Lessons from using AI in Discovery
thoughtbot.com·17h
🕵️Metadata Mining
Call for Submissions: Public Services Quarterly
archivespublishing.com·3h
📚Library and Information Science
Digital Forensics Jobs Round-Up, September 15 2025
forensicfocus.com·2h
🚨Incident Response
A key type of AI training data is running out. Googlers have a bold new idea to fix that.
businessinsider.com·1h
Vector Forensics
Listening to Unreliable Narrators
secondvoice.substack.com·6h·Discuss: Substack
Manuscript Networks
LLM Rerankers for RAG: A Practical Guide
fin.ai·20h
Information Retrieval